library(ids)
library(tidyverse)
library(popdemo)

Request

Please do not share this document with the students. This is only for your use, either to get a sense of the data or as a crib sheet with information to pass on to students.

Open CMR data

What are open CMR models?

  • “Open” in open CMR means that the population is open to change, therefore we assume that animals can be born and die. We assume (read “we hope and pray”) that immigration and emigration are negligible and can be ignored.

  • Figuring out what the survival probability of a group of animals is, can be hugely valuable for conservation or management, e.g., if your population is crashing, is it because survival has taken a hit?

  • The issue is that to figure out survival, you need to take imperfect detection into account.

    • For instance, did you not see an animal because it was dead, or because it simply didn’t enter your trap that day?
  • There are a variety of open CMR models. The one used in this practical is the “bog standard” one (at least to my mind). The Cormack-Jolly-Seber model (abbreivated to to CJS, named after the authors, though I often mistakenly call it the Colly-Jolly-Seber model).

  • Open CMR models are state-spaced models which allow the estimation of survival rates.

    • A state space model is merely one whereby an animals can “occupy” one of of multiple states. In open CMR models, these states are Alive or Dead.

    • Animals can transition between states (e.g., Sleeping to Moving to Foraging, as in Hidden Markov Chain models [HMMs], or Ocupied to Vacant to Occupied, as in occupancy models), but in open CMR models, the only transition can be from Alive to Dead.

    • Zombies do not exist, so an animal cannot go from Dead to Alive.

Which parameters are estimated?

  • Open CMR models estimate survival probabilities and recapture probabilities.

    • Note that we rarely care about recapture probabilities, they are only included to deal with imperfect detection (though if you are designing a new type of trap, for example, you might care about recapture probabilities).

    • The reason why we only care about recapture probabilities, and not initial capture probabilities, is that you do not care about what the probability was for an animal to enter your trap initially. This does not help you determine if an animals survival. All it tells you is that that particular animal was alive, which you know already because it went into your trap.

    • Instead, you need to know the probability that it re-entered your trap (or you detected it having put a radio-collar on, or however you are doing “trapping”). With this information you can help tease apart if the animal was alive and observed, alive and unobserved, or dead and unobserved.

  • The two parameters are estimated together in two sub-models.

    • The state process model deals with survival, where, if the animal was alive (state = 1), then this is multiplied by the survival probability (assume 87%) to determine if the animal was likely to be alive the following year: \(1 \times 0.87 = 0.87\). If the animal is dead (state = 0), then this resolves to \(0 \times 0.87 = 0\).

    • The observation process model deals with whether or not the animal was recaptured, but this must be conditioned on whether the animal was alive or not. Dead animals have a hard time entering a trap. If the animal was likely to be alive, then the state is multiplied by the recapture probability. Assume the recapture probability is 32%, then if the animal was alive; \(1 \times 0.32 = 0.32\), while if it were dead; \(0 \times 0.32 = 0.32\). This is how open CMR models deal with animals who may or may not be alive, as well as observed or not (so long as it was alive to be observed). It also ensures that that dead animals cannot be captured.

The data

  • As with closed CMR models, open CMR uses capture history (e.g., where 110 represented that an animal was capture on day 1 and day 2, but it was not captured on day 3).

  • Because we only care about recapture, if we have 20 years of capture histories, only 19 years are actually used to estimate recapture, as the first year would represent the initial capture probability. So for a CH of 110, the only information used for recapture probability is -10.

  • Similarly, because survival is the probability to successfully transition from one year to the next (and in doing so survive), if we have 20 years of data, we only have 19 transitions. For a CH of 110, the information used is the first transition, 11, and then the second transition 10.

  • This is why when you biuld models in MARK, your matrices will be shortened by one year for both recapture and survival probabilities.

Simulate the data

Below is the code used to simulate the open CMR data as will be used by the students. Note that the code here is much more complex and involved than the closed CMR simulation, partly because the actual simulation is more complex but also because I am also not a particularly elegant coder. I have tried to comment as much as possible to make it relatively transparent, though there is one section where I can’t remember how I came up with the code. This section falls very much into the “I don’t know why it works but it does, so don’t touch it” (the section where I get the numerical age of the rabbits).

Years and number of captures

# Setting the seed keeps the randomness consistent
set.seed(666)

# How many years of data we will simulate
n_years <- 16

# Year that data collection started
start_year <- 2007

# Number of new individuals in each each age group captured each year
  # Number drawn from a poission distribution (rpois)
  # We generate n_years - 1 values (15 years of new captures)
  # We capture on average 9 one year olds (lambda = 9)
  # Values for lambda informed from a matrix model to get age structure based on survivals
marked_one <- rpois(n = n_years - 1, lambda = 9)
marked_two <- rpois(n = n_years - 1, lambda = 5)
marked_adult <- rpois(n = n_years - 1, lambda = 25)

Mean survival

set.seed(666)

# One year old survival (in normal circumstance)
mean_phi_1 <- 0.49

# Two year old survival (in normal circumstances)
mean_phi_2 <- 0.64

# Adult survival (in normal circumstances)
mean_phi_adult <- 0.82

# survival of rabbits as they age
# Takes into account that animals grow up each year
phi_1 <- c(mean_phi_1, mean_phi_2, rep(mean_phi_adult, n_years - 3))
phi_2 <- c(mean_phi_2, rep(mean_phi_adult, n_years - 2))
phi_adult <- rep(mean_phi_adult, n_years - 1)

The average survival for one year olds is 49%, for two year olds is 64% and 82% for adults. Note that ideally these were all originally 5% higher, but were lowered to ensure observed survivals were somewhat reasonable (i.e., did not lead to explosions of rabbits in the matrix models).

Detection

set.seed(666)

# Mean recapture probability is 60% (note with open CMR, we do not care about initial capture)
p <- rep(0.6, n_years - 1)

Note that detection is kept constant at 60%. I could include some covariate to influence detection, but I’m inclined to leave it as is. Let me know if you have strong feelings on this.

One year olds

# Create a matrix and plug in all of the 60% detection probabilities
P_1 <- matrix(rep(p, sum(marked_one)), 
              ncol = n_years - 1, 
              nrow = sum(marked_one), 
              byrow = TRUE)

Two year olds

P_2 <- matrix(rep(p, sum(marked_two)), 
              ncol = n_years - 1, 
              nrow = sum(marked_two), 
              byrow = TRUE)

Adults

P_A <- matrix(rep(p, sum(marked_adult)), 
              ncol = n_years - 1, 
              nrow = sum(marked_adult), 
              byrow = TRUE)

True survival

Covariates influencing survival

There are three covariates that have an effect on rabbit survival.

  • The first is rabbit age. This was specified earlier in the code (above), but acts as a contrast treatment. One year olds have the lowest, two-year olds have “medium” survival, and adults have the highest.

  • The second is mongoose presence, where mongoose presence lowers survival by 3.5%. This acts as another contrast treatment.

  • The final is distance from human building. This is a conceptually challenging covariate and can be easily misinterpreted by students. Human distance is a proxy variable (i.e., a variable which captures another variable). Here, human distance is meant to be a proxy for cat and dog proximity - both of which are a legitimate threat to the rabbits in real life. So part of the confusion comes from this being a proxy variable. The other part which causes confusion is that humans are getting closer each year - so a decreasing trend in human distance over time. However, the relationship with rabbit survival is positive. Keep in mind, however, that this means that the larger the value for human distance, the larger survival is. You want humans far away (because of their pets), but they are moving closer each year.

When we run this practical, make sure to go over distance from human building in detail with the students and make sure they understand it. Last year, it caused a few of them to get confused. I choose to keep it in this year, because it requires genuine thought to make sense of it.

Mongoose presence is determine as \(logit(p_i) = 4 - 0.38 \times Year_i\), so the probability of mongoose presence declines over the years. This reflects the control of mongoose in the site.

Human distance is a declining trend as well, meaning humans get closer over time, and is generated through the negative cumulative sum of distances from all previous years, with a trend of 0.5 km (which gets inverted in the cumulative sum) and a trend (see the code).

Survival is generated according to:

\(\phi_i \sim binomial(p_i)\)

\(logit(p_i) = \beta_0 + \beta_12YearOld_i + \beta_2Adult_i + \beta_3Mongoose_i + \beta_4Human_i\)

where \(\phi_i\) is whether a rabbit survives or not (\(\phi\) = phi), \(p_i\) is the probability that the rabbit survives, \(\beta_0\) is the global intercept, which defaults to one year old survival here, \(\beta_1\) is the change in survival for two year olds, \(2YearOld_i\) is a dummy variable indicating if rabbit \(i\) is a two year old (1) or not (0), \(\beta_2\) is the change in survival for adults, \(Adult_i\) is a dummy variable for rabbit \(i\) being an adult, \(\beta_3\) is the effect of mongoose on rabbit \(i\), \(Mongoose_i\) is dummy variable indicating whether rabbit \(i\) was exposed to mongoose, \(\beta_4\) is the effect of human distance (and hence pets), and \(Human_i\) is the distance humans were from the site.

The values for the various \(\beta_k\) are included in the code below, though the exact values of these may alter very slightly with any updates.

set.seed(666)
# Mongoose predation
mongoose <- rbinom(n = n_years, size = 1, prob = plogis(4 + -0.38 * 1:n_years))

beta_mongoose <- -0.035

# Nearest human infrastructure (km)
trend <- 0.5                                     # Humans get 0.5 km closer every year
human <- 10 -cumsum(rnorm(n_years, 0.1) + trend) # Distance from the site

beta_human <- 0.006

One year old

# Aiming for one year old survival of 0.52 at the lowest

# Define matrices with survival and recapture probabilities
# One year olds
PHI_1 <- matrix(0,                      # Fill the matrix with zeros initially
                ncol = n_years - 1, # Number of years when rabbits could be recaptured
                nrow = sum(marked_one)) # Total number of marked one year olds

# This line of code fills the matrix with the survival of yearlings in the current year
  # Takes into account that each year there is a new batch of babies, who will then grow up
  # The survival rates here are just the mean survival rates for that age (given one year olds grow into 2 year olds, etc.)
for (i in 1:length(marked_one)) {
  PHI_1[(sum(marked_one[1:i]) - marked_one[i] + 1):sum(marked_one[1:i]), i:(n_years - 1)] <- matrix(rep(phi_1[1:(n_years - i)], 
                                                                                                        marked_one[i]), 
                                                                                                    ncol = n_years - i, 
                                                                                                    byrow = TRUE)
}

# Take the mean survival and adjust them according to mongoose presence and human distance
  # The ifelse is ensuring that we do this only for rabbits alive
for(i in 1:ncol(PHI_1)){
  PHI_1[,i] <- ifelse(PHI_1[,i] != 0, PHI_1[,i] + beta_mongoose * mongoose[i] + beta_human * human[i], PHI_1[,i])
}

Two year old

# Aiming for two year old survival of 0.62 at the lowest

# Code the same as for one year olds
PHI_2 <- matrix(0, 
                ncol = n_years - 1, 
                nrow = sum(marked_two))

for (i in 1:length(marked_two)) {
  PHI_2[(sum(marked_two[1:i]) - marked_two[i] + 1):sum(marked_two[1:i]), i:(n_years - 1)] <- matrix(rep(phi_2[1:(n_years - i)], 
                                                                                                        marked_two[i]), 
                                                                                                    ncol = n_years - i, 
                                                                                                    byrow = TRUE)
}

for(i in 1:ncol(PHI_2)){
  PHI_2[,i] <- ifelse(PHI_2[,i] != 0, PHI_2[,i] + beta_mongoose * mongoose[i] + beta_human * human[i], PHI_2[,i])
}

Adults

# AIming for adult survival of 0.83 at the lowest
PHI_a <- matrix(rep(phi_adult, sum(marked_adult)), 
                ncol = n_years - 1,
                nrow = sum(marked_adult), 
                byrow = TRUE)


for(i in 1:ncol(PHI_a)){
  PHI_a[,i] <- ifelse(PHI_a[,i] != 0, PHI_a[,i] + beta_mongoose * mongoose[i] + beta_human * human[i], PHI_a[,i])
}

CJS simulation function

The base version of this code comes from Olivier Giminez, I think from his online Bayesian CMR course but I may be wrong.

simul.cjs <- function(PHI, P, marked) { # User must provide the survival and recapture matrices, as well as the number of marked animals
  
  # How many years from the PHI matrix, plus 1 year (survival is a transition between 2 years)
  n.occasions <- dim(PHI)[2] + 1
  
  # Create an empty matrix to fill with the CH
  CH <- matrix(0, 
               ncol = n.occasions, 
               nrow = sum(marked))
  
  # Vector with numerical value of when an animal was caught
  mark.occ <- rep(1:length(marked), 
                  marked[1:length(marked)])
  
  # fill the CH matrix
  for(i in 1:sum(marked)) {                     # For each individual in turn
    CH[i, mark.occ[i]] <- 1                     # CH has a 1 assigned when it was caught
    if(mark.occ[i] == n.occasions) next         # If this is the final year of trapping, skip to the next individual
    
    for(t in (mark.occ[i] + 1):n.occasions) {   # For each year an animal could have been alive
      sur <- rbinom(1, 1, PHI[i,t-1])           # See if it survives based on its survival for the proceeding year
      if(sur == 0) break                        # If it died, then stop (don't keep trying to figure out if it survived in later years)
      rp <- rbinom(1, 1, P[i, t-1])             # If it was alive, was it caught? Uses recapture prob to determine
      if(rp == 1) CH[i,t] <- 1                  # Assign a 1 in the capture history
    } # close t loop
  } # close i loop
  return(CH)                                    # Spit out the capture history
} # close function

Simulate capture histories

Use the information from above to generate actual capture histories for the various rabbits.

set.seed(666)

# One year old CH
CH_1 <- simul.cjs(PHI_1, P_1, marked_one)  # individuals marked as yearlings

# Two year old CH
CH_2 <- simul.cjs(PHI_2, P_2, marked_two)  # individuals marked as sub-adults

# Adult CH
CH_A <- simul.cjs(PHI_a, P_A, marked_adult)  # individuals marked as adults 

# Bind the rows of each CH together
CH <- rbind(CH_1, CH_2, CH_A)

# Create vector with occasion of marking
  # I.e., when was an animal first captured
get.first <- function(x) min(which(x!=0))
f_1 <- apply(CH_1, 1, get.first)
f_2 <- apply(CH_2, 1, get.first)
f_a <- apply(CH_A, 1, get.first)

# Create empty matrices X indicating age classes which will be filled later
x_1 <- matrix(NA, ncol = dim(CH_1)[2]-1, nrow = dim(CH_1)[1])
x_2 <- matrix(NA, ncol = dim(CH_2)[2]-1, nrow = dim(CH_2)[1])
x_a <- matrix(NA, ncol = dim(CH_A)[2]-1, nrow = dim(CH_A)[1])

# Yearling data
# This loop distinguishes between adult and non-adult
for(i in 1:nrow(CH_1)){
  for(t in f_1[i]:(ncol(CH_1) - 1)){
    x_1[i, t] <- 3
    x_1[i, f_1[i]] <- 1   
  } #t
} #i

# This loop specifies of those non-adults, which were 1 year olds
for( c in 2:ncol(x_1)){
  x_1[,c]<- ifelse(!is.na(x_1[, c - 1]) & x_1[, c - 1] == 1, 2, x_1[, c])
}

# Adult and non-adult (here, all non-adults are 2 year olds)
for(i in 1:nrow(CH_2)){
  for(t in f_2[i]:(ncol(CH_2) - 1)){
    x_2[i, t] <- 3
    x_2[i, f_2[i]] <- 2   
  } #t
} #i

# Which were adults
for(i in 1:nrow(CH_A)){
  for(t in f_a[i]:(ncol(CH_A)-1)){
    x_a[i, t] <- 3
  } #t
} #i

Store and tidy data

Age

This is age when the rabbit was first captured, to be used mostly to help orientate the students.

set.seed(666)

# Unique animal IDs
rabbit_names <- adjective_animal(n = nrow(CH))

# Starting with age
age <- rbind(x_1, x_2, x_a)

for(i in 1:ncol(age)) {
  age[,i] <- ifelse(age[,i] == 1, 
                    "One year old", 
                    ifelse(age[,i] == 2,
                           "Two year old",
                           ifelse(age[,i] == 3,
                                  "Adult",
                                  age[,i])))
}

age <- as.data.frame(age) 
age$ID <- rabbit_names

seq_names <- seq(start_year, start_year + n_years - 2, 1)

colnames(age)[1:length(seq_names)] <- seq_names

age <- age[, c(16, 1:15)]

age
# write.csv(age, "open_CMR_age_covariate_2022.csv", row.names = FALSE)

Mongoose presence

This is constant for all rabbits. Mongoose were either present, and impacted all rabbits or they were not.

set.seed(666)

mongoose_df <- matrix(NA, 
                      ncol = n_years, 
                      nrow = nrow(CH))

for(i in 1:ncol(mongoose_df)){
  mongoose_df[,i] <- mongoose[i]
}

mongoose_df <- as.data.frame(mongoose_df)

mongoose_df$ID <- rabbit_names

seq_names <- seq(start_year, start_year + n_years, 1)

colnames(mongoose_df)[1:length(seq_names)] <- seq_names

mongoose_df <- mongoose_df[, c(17, 1:16)]

mongoose_df
# write.csv(mongoose_df, "open_CMR_mongoose_covariate.csv", row.names = FALSE)

Distance to humans

Humans getting closer to the site over time, lowering survival. Keep in mind this reflects predation by pets, and maybe the odd human every now and again…

human_df <- matrix(NA, 
                   ncol = n_years, 
                   nrow = nrow(CH))

for(i in 1:nrow(human_df)) {
  human_df[i,] <- human
}

human_df <- as.data.frame(human_df)

human_df$ID <- rabbit_names

seq_names <- seq(start_year, start_year + n_years - 1, 1)

colnames(human_df)[1:length(seq_names)] <- seq_names

human_df <- human_df[, c(17, 1:16)]

human_df
# write.csv(human_df, "open_CMR_human_distance_covariate.csv", row.names = FALSE)

Capture histories

As will be used to create the .INP file for MARK.

CH_df <- as.data.frame(CH)

CH_df$ID <- rabbit_names

seq_names <- seq(start_year, start_year + n_years - 1, 1)

colnames(CH_df)[1:length(seq_names)] <- seq_names

CH_df <- CH_df[, c(17, 1:16)]

CH_df
# write.csv(CH_df, "open_CMR_capture_history.csv", row.names = FALSE)

Summary of the data

Survival over time

The ribbons here are the survival rates I was targetting based on the matrix models. The upper part of the ribbon shows the survival probabilities which results in the population doubling in 25 years. The bottom of the ribbon shows the survival probabilities which results in the population crashing to 50 individuals after 25 years (from an initial 250). The horizontal lines show the initial survival probabilities for each age, before including the effects of the covariates (note that I had to reduce the initial survivals by 5% to get reasonable range of survival probabilities).

df <- data.frame(
  age = c(rep("One", 16), rep("Two", 16), rep("Adult", 16)),
  year = c(2007:2022, 2007:2022, 2007:2022),
  survival = c((mean_phi_1 + beta_mongoose * mongoose + beta_human * human),
               (mean_phi_2 + beta_mongoose * mongoose + beta_human * human),
               (mean_phi_adult + beta_mongoose * mongoose + beta_human * human)),
  mongoose = c(mongoose, mongoose, mongoose),
  human = c(human, human, human)
)

df$age <- factor(df$age, levels = c("One", "Two", "Adult"))

df$mean_phi <- c(rep(mean_phi_1, 16), rep(mean_phi_2, 16), rep(mean_phi_adult, 16))

df$upp_phi <- c(rep(0.55, 16), rep(0.70, 16), rep(0.90, 16)) # population doubles in 25 years, lambda = 1.03
df$low_phi <- c(rep(0.48, 16), rep(0.63, 16), rep(0.81, 16)) # population drops to ~ 50 individuals in 25 years, lambda = 0.92

plotly::ggplotly(ggplot(df) +
                   geom_ribbon(aes(x = year, y = survival, ymax = upp_phi, ymin = low_phi, fill = age), alpha = 0.2) +
                   geom_hline(aes(yintercept = mean_phi, linetype = age, colour = age)) +
                   geom_point(aes(x = year, y = survival, colour = age),
                              size = 2) +
                   geom_path(aes(x = year, y = survival, colour = age)) +
                   scale_y_continuous(labels = scales::percent) +
                   labs(x = "Year",
                        y = "Survival",
                        colour = "Age",
                        fill = "Age",
                        linetype = "Age") +
                   scale_colour_brewer(palette = "Set2") +
                   scale_fill_brewer(palette = "Set2") +
                   theme_bw() +
                   theme_minimal()
)

Summary table of survival

A table summarising the range of true survivals by age and mongoose presence (as percentages). Variation in survival is caused exclusively by age, mongoose presence, and human distance. These are not estimates, they are the actual values.

df$mongoose <- ifelse(df$mongoose == 1, "Present", "Absent")

df %>% 
  group_by(age, mongoose) %>% 
  summarise(mean = round(mean(survival), digit = 3) * 100,
            minimum = round(min(survival), digit = 3) * 100,
            maximum = round(max(survival), digit = 3) * 100)

All variables

Figure below shows impact of all three covariates (age, mongoose presence, and distance to humans). Note here that increasing human distance is good, hence the parameter is positive. Be sure to explain to students that this is bad, because the raw data shows humans getting closer, not further away.

plotly::ggplotly(ggplot(df) +
                   geom_ribbon(aes(x = human, y = survival, ymax = upp_phi, ymin = low_phi, fill = age), alpha = 0.2) +
                   geom_hline(aes(yintercept = mean_phi, linetype = age, colour = age)) +
                   geom_smooth(aes(x = human, y = survival, colour = age),
                               method = "lm") +
                   geom_point(aes(x = human, y = survival, colour = age),
                              size = 2) +
                   scale_y_continuous(labels = scales::percent) +
                   facet_grid( ~ mongoose) +
                   labs(x = "Distance to humans",
                        y = "Survival",
                        colour = "Age",
                        fill = "Age") +
                   scale_colour_brewer(palette = "Set2") +
                   scale_fill_brewer(palette = "Set2") +
                   theme_bw() +
                   theme_minimal()
)

Conditional predictions

One of the main uses of the open CMR models, for the students’ management plans, is to feed these into matrix models (as I’ve done in the figure above). However, if they use the survival probabilities fit from the model alone, then the most they can hope for is a stable population growth.

The reason for this is that survival is hindered by both mongoose and human distance (see the figure above with human distance on the x-axis). Note that the furthest distance humans are from the site, and when mongoose are absent is 8 km, while the furthest distance that humans are when mongoose are present is 10 km. While their model wouldn’t estimate it, when humans are > 8 km away and and mongoose are absence, the survival probabilities of the rabbits leads to the population growing, but to determine this, the students have to be able to use the parameter estimates to reconstruct their equation and use this equation to predict survival probabilities.

Last year, only the most engaged students tried this. So by all means suggest to the students that there may be a way to predict survival probabilities when there is no, or little, influence of both mongoose and humans, but to so subtly. If they are interested in trying to do this, they will need your help to understand. I will have covered some of this in the lecture, but they will have no practical knowledge of it, so won’t truly understand.

To predict a particular situation, such as humans 10 km and no mongoose, they should begin by writing the equation used in the model.

For clarity, if the model were written using the R formula argument, it would be phi ~ age + mongoose + human distance, however this can be misleading as factors/categorical variables (such as age) are actually broken into their constituent levels. The full equation would be:

\(\phi_i \sim binomial(p_i)\)

\(logit(p_i) = \beta_0 + \beta_12YearOld_i + \beta_2Adult_i + \beta_3Mongoose_i + \beta_4Human_i\)

Having run the model, the students should have estimates for the various \(\beta\)s. Here I will use arbitrary values for the parameters, just to demonstrate how the process works.

Assume:

  • \(\beta_0 = 0.4\) (one year old survival)

  • \(\beta_1 = 0.4\) (difference between one year old and two year old survival)

  • \(\beta_2 = 1.0\) (difference between one year old and adult survival)

  • \(\beta_3 = -0.5\) (effect of mongoose)

  • \(\beta_4 = 0.02\) (effect of human distance)

It’s important that the students understand, if they are going to try and do predictions, that age is a contrast treatment (i.e., difference between groups) as well as understanding the logit link. They are taught about the logit link, so may have some rough intuition, but contrast treatments might need some more reinforcement.

The logit link is an internal transformation of probabilities (most often from a binomial distribution, {0,1}) so that a linear relationship can be fit to the data. Trying to fit the data as 1 or 0 (survived or died) is rarely, if ever, possible, and fitting to probabilities can lead to estimates above 100% or below 0%. The logit link resolves this issue and allows straight line fits (i.e., \(y = mx + c\)).

Predicting for each age

One year old survival

For example, assume one year old survival was 60%. The value 60%, or 0.6, is on the response scale, but the model is fit on the link scale. The link scale here is the logit function, which is:

\(logit(\phi) = ln(\frac{\phi}{1-\phi})\)

which if our value of p is 0.6 becomes:

\(logit(0.6) = ln(\frac{0.6}{1-0.6}) = 0.4\)

This means that the \(\beta_0 = 0.4\), is actually a 60% survival probability on the response scale. The parameters are returned, by the model, on the link scale (both in R and in MARK. For more intuition, see the following figure which shows how probabilities (on the y-axis) are related to their logit values (on the x-axis).

p <- seq(from = 0.01, to = 0.99, by = 0.01) # I avoid 0 and 1 as these are -Inf and Inf, which won't plot
logit <- function(p){
  return(log(p/(1-p)))
}
l <- logit(p)

ggplot() +
  geom_line(aes(x = l, y = p)) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "logit(p)",
       y = "p") +
  theme_bw() +
  theme_minimal()

Two year old survival

To calculate two year old survival, the solution is not to simply do a backtransformation of \(\beta_1 = 0.4\) (as we did for one year olds), because 0.4 is the difference between two year old survival and one year old survival. That means that if the value is positive, then two year old survival is greater than one year old, while if the value were negative, then one years have a higher survival compared to two year olds. The way you calculate two year survival is then:

\(logit(\phi_i) = \beta_0 + \beta_1 \times TwoYearOld_i\)

Note that \(TwoYearOld_i\) is a “dummy” variable, which just means that for any given individual, if they were a two year old, they are assigned a 1 or a 0 if they are not. This happens automatically for users in R, which is nice, but in my experience this automation leads to lots of confusion.

So if we want to predict the mean survival of a two year old rabbit:

\(logit(\phi_i) = \beta_0 + \beta_1 \times TwoYearOld_i = 0.4 + 0.4 \times 1 = 0.8\)

To backtransform this logit value to a probability we use the inverse logit function, which is:

\(\phi_i = \frac{e^p}{(1 + e^p)}\)

where \(e\) is the exponential, and \(p\) is the logit value (I think technically I should have kept using \(\phi\) on the right side, instead of \(p\), but I wanted to make it clear that it’s the logit value).

There is a function to do this in R called plogis():

plogis(0.8)
## [1] 0.6899745

So two year old survival is 69%.

Adult survival

For adults, we do the same process.

\(logit(\phi_i) = \beta_0 + \beta_1 \times TwoYearOld_i + \beta_2 \times Adult_i = 0.4 + 0.4 \times 0 + 1.0 \times 1 = 1.4\)

Which when we backtransform becomes:

plogis(1.4)
## [1] 0.8021839

So adults have an 80% chance to survive.

But note that these predictions of survival are in a vacuum, where neither mongoose nor humans have any effect on survival.

Predicting for mongoose

Mongoose presence works pretty much the same way as age, except that with mongoose there are only two levels; Present and Absent. Again, this is converted to a dummy variable, so present would be 1 and absent would be 0. Because absent is 0, our intercept (one year old survival) and survival of each age that we estimate above, becomes the survival in the absence of mongoose. To figure out how much mongoose does effect survival we would do:

\(logit(\phi_i) = \beta_0 + \beta_1 \times TwoYearOld_i + \beta_2 \times Adult_i + \beta_3 \times Mongoose_i\)

Let’s assume we want to predict the survival of an adult in the presence of mongoose. Keep in mind that the effect of mongoose is negative. To predict this, we’d do:

\(logit(\phi_i) = \beta_0 + \beta_1 \times TwoYearOld_i + \beta_2 \times Adult_i + \beta_3 \times Mongoose_i = 0.4 + 0.4 \times 0 + 1.0 \times 1 -0.5 \times 1 = 0.9\)

Which when we plug this into the plogis function (or do it by hand if you prefer), you get:

plogis(0.9)
## [1] 0.7109495

Which means that adults, in the presence of mongoose, have a survival probability of 71% - a decrease of 9%.

Predicting for human distance

Human distance is probably the easiest parameter to work with. Because human distance is a continuous variables it means we do not need to use any dummy variables. We simply use the raw numbers. Assume we want to predict survival for two year olds, in the presence of mongoose, with humans 10.32 km away:

\(logit(\phi_i) = \beta_0 + \beta_1 \times TwoYearOld_i + \beta_2 \times Adult_i + \beta_3 \times Mongoose_i + \beta_4 \times Human_i = 0.4 + 0.4 \times 1 + 1.0 \times 0 -0.5 \times 1 + 0.02 \times 10.32 = 0.5\)

Again, plugging this into plogis we get:

plogis(0.5)
## [1] 0.6224593

A 62% chance to survive for two year olds, when there is a mongoose present and humans are 10.32 km away. If humans were right next to the rabbit \(Human = 0\), then you’d have a 57% chance of survival.

Student predictions

For students trying to figure out what the maximum survival probabilities would be if everything was “perfect” (i.e., no mongoose and humans 10 km away), they would do:

One year olds

\(logit(\phi_1) = \beta_0 + \beta_1 \times TwoYearOld + \beta_2 \times Adult + \beta_3 \times Mongoose + \beta_4 \times Human = 0.4 + 0.4 \times 0 + 1.0 \times 0 -0.5 \times 0 + 0.02 \times 10 = 0.6\)

Which, with the arbitrary parameters used here would be 65%.

Two year olds

\(logit(\phi_2) = \beta_0 + \beta_1 \times TwoYearOld + \beta_2 \times Adult + \beta_3 \times Mongoose + \beta_4 \times Human = 0.4 + 0.4 \times 1 + 1.0 \times 0 -0.5 \times 0 + 0.02 \times 10 = 1\)

Which, with the arbitrary parameters used here would be 73%.

Adults

\(logit(\phi_2) = \beta_0 + \beta_1 \times TwoYearOld + \beta_2 \times Adult + \beta_3 \times Mongoose + \beta_4 \times Human = 0.4 + 0.4 \times 0 + 1.0 \times 1 -0.5 \times 0 + 0.02 \times 10 = 1.6\)

Which, with the arbitrary parameters used here would be 83%.